{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0cadfd20",
   "metadata": {},
   "source": [
    "// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
    "// SPDX-License-Identifier: MIT-0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d61daf89-0485-47bf-aa96-280577bbadaf",
   "metadata": {},
   "source": [
    "# Fine-Tune Amazon Nova model provided by Amazon Bedrock: End-to-End\n",
    "\n",
    "This notebook demonstrates the end-to-end process of fine-tuning Amazon Nova Lite and Amazon Nova Micro model using Amazon Bedrock, including selecting the base model, configuring hyperparameters, creating and monitoring the fine-tuning job, deploying the fine-tuned model with provisioned throughput and evaluating the performance of the fine-tuned model. \n",
    "\n",
    "Note: The following steps can also be done through the Amazon Bedrock Console"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7cc3adc2",
   "metadata": {},
   "source": [
    "# Prerequisites\n",
    "\n",
    "- Make sure you have prepared a fine-tuning dataset following the format required [here]( \n",
    "https://docs.aws.amazon.com/nova/latest/userguide/customize-fine-tune-prepare.html)\n",
    "- Make sure your AWS account has appropriate permissions (e.g. access to Amazon Bedrock (us-east-1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0147f270-3cf7-4a94-b806-9fc3aabda552",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef96a213-b08e-41ec-a987-c1fdd4e3aa0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# restart kernel for packages to take effect\n",
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dae87342-ea92-4b49-a705-184027a76356",
   "metadata": {},
   "source": [
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2e1cbb30-eedc-4079-a8b3-5eb21dd88e53",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import boto3 \n",
    "from botocore.config import Config\n",
    "import sys\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import json\n",
    "import time \n",
    "import concurrent.futures\n",
    "import shortuuid\n",
    "import tqdm\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "742d8e7f-bb20-4283-9b81-fd67fb15c88d",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_config = Config(\n",
    "    region_name = 'us-east-1', \n",
    "    signature_version = 'v4',\n",
    "    retries = {\n",
    "        'max_attempts': 5,\n",
    "        'mode': 'standard'\n",
    "    })\n",
    "\n",
    "bedrock = boto3.client(service_name=\"bedrock\", config=my_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eedbd33d",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Specify input S3 bucket\n",
    "\n",
    "input_s3_uri = \"s3://ft-aws-domain/ft_bedrock_data/ft_olympus_jsonl/aws_train_olympus.jsonl\"\n",
    "output_s3_uri = \"s3://ft-aws-domain/model_output/ft_nova_lite_aws_v1/\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41d127c7-8bee-4cca-86af-cf624a088ad0",
   "metadata": {},
   "source": [
    "# Select the base model to fine-tune"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1556c1dd",
   "metadata": {},
   "source": [
    "You need to provide the `base_model_id` for the model you want to fine-tune. You can find a list of the foundational model ids by invoking the `list_foundation_models` API:\n",
    "\n",
    "``` \\n\n",
    "for model in bedrock.list_foundation_models(\n",
    "    byCustomizationType=\"FINE_TUNING\")[\"modelSummaries\"]:\n",
    "    for key, value in model.items():\n",
    "        print(key, \":\", value)\n",
    "    print(\"-----\\n\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "33b0440f-838d-412e-bcf0-bf3ca9050aa8",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "nova_micro_identifier = \"arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-micro-v1:0:128k\"\n",
    "nova_lite_identifier = \"arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0:300k\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "782da1f9",
   "metadata": {},
   "source": [
    "Next, provide the `customization_job_name`, `custom_model_name` and `customization_role` which will be used to create the fine-tuning job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ae02645",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Nova model customization currently only available in US EAST 1\n",
    "\n",
    "role_name = # Replace with your role name\n",
    "role_arn = # Replace with your role ARN\n",
    "\n",
    "job_name = \"aws-ft-nova-lite-v1\"\n",
    "model_name = job_name "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f57903f8-0c1c-4039-b398-45182d937174",
   "metadata": {},
   "source": [
    "# Create fine-tuning job"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13ca815f-255f-4920-87b9-acb36b27094d",
   "metadata": {},
   "source": [
    "<div class=\\\"alert alert-block alert-info\\\">\n",
    "    <b>Note:</b> Fine-tuning job will take around 2-4 hrs to complete.</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e57db8e-5356-4ccd-8d53-35941d4db999",
   "metadata": {},
   "source": [
    "| ***Parameter Name*** | ***Parameter Description*** | ***Type*** | ***Min*** | ***Max*** | **Default** |\n",
    "| ------- | ------------- | ------ | --------- | ----------- | ----------- |\n",
    "| Epochs | The maximum number of iterations through the entire training dataset | integer | 1 | 5 | 2 |\n",
    "| Learning rate | The rate at which model parameters are updated after each batch of training data | float | 1.00E-06 | 1.00E-04 | 1.00E-05 |\n",
    "| Learning rate warmup steps | Number of iterations over which learning rate is gradually increased to the initial rate specified | integer | 0 | 20 | 10 |\n",
    "| Batch size | The number of samples processed before updating model parameters | integer | NA | NA | Fixed at 1|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "0ee8a157-fb04-4e9d-beca-bdabfbed39e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the customization type from \"FINE_TUNING\" or \"CONTINUED_PRE_TRAINING\". \n",
    "customization_type = \"FINE_TUNING\"\n",
    "\n",
    "\n",
    "# Define the hyperparameters for fine-tuning Amazon Nova model\n",
    "hyper_parameters = {\n",
    "        \"epochCount\": \"1\",\n",
    "        \"learningRate\": '0.000001', \n",
    "        \"batchSize\": \"1\",\n",
    "    }\n",
    "\n",
    "\n",
    "response_ft = bedrock.create_model_customization_job(\n",
    "    customizationType=customization_type,\n",
    "    jobName = job_name,\n",
    "    customModelName = model_name,\n",
    "    roleArn = role_arn,\n",
    "    baseModelIdentifier = nova_lite_identifier,\n",
    "    hyperParameters=hyper_parameters,\n",
    "    trainingDataConfig={\"s3Uri\": input_s3_uri},\n",
    "    outputDataConfig={\"s3Uri\": output_s3_uri},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27f1137d-3ba8-4d42-af84-6d1e489af009",
   "metadata": {},
   "source": [
    "# Check fine-tuning job status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "99ccd7de-ae97-45d2-92b1-5f61e4157204",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Job status: InProgress\n"
     ]
    }
   ],
   "source": [
    "jobArn = response_ft.get('jobArn')\n",
    "status = bedrock.get_model_customization_job(jobIdentifier=jobArn)[\"status\"]\n",
    "print(f'Job status: {status}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54a8ac36-7e3d-4e51-94ef-dd41cd4fe21d",
   "metadata": {},
   "source": [
    "# Setup provisioned throughput"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c11f6925",
   "metadata": {},
   "source": [
    "Once the job status changes to `complete`, we need to create provisioned throughput which is needed for running inference on the fine-tuned Amazon Nova model. For more information on provisioned throughput, please refer to [this documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71f64472-00ca-4f4f-a4d4-065e2b61684c",
   "metadata": {},
   "outputs": [],
   "source": [
    "provisioned_model_name = 'finetuned_nova_lite'\n",
    "custom_model_id = 'aws-ft-nova-lite-v1'\n",
    "\n",
    "provisioned_model_id = bedrock.create_provisioned_model_throughput(\n",
    "                                        modelUnits=1,\n",
    "                                        provisionedModelName=provisioned_model_name,\n",
    "                                        modelId=custom_model_id\n",
    "                            )\n",
    "\n",
    "print(provisioned_model_id['provisionedModelArn'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de9f56bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "status_provisioning = bedrock.get_provisioned_model_throughput(provisionedModelId = provisioned_model_id)['status']\n",
    "\n",
    "import time\n",
    "while status_provisioning == 'Creating':\n",
    "    time.sleep(60)\n",
    "    status_provisioning = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_id)['status']\n",
    "    print(status_provisioning)\n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6a8606f-049a-4c7b-939c-c5c44bfda210",
   "metadata": {},
   "source": [
    "# Delete provisioned throughput"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9939990",
   "metadata": {},
   "source": [
    "<b>Warning</b>: Please make sure to delete providsioned throughput as there will cost incurred if its left in running state, even if you are not using it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4341f7e8-e85b-4c88-9f77-11b2afc463de",
   "metadata": {},
   "outputs": [],
   "source": [
    "bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_pytorch_p310",
   "language": "python",
   "name": "conda_pytorch_p310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
